Pattern recognition method

ABSTRACT

In a machine-implemented voice recognition method, speech signals are first analyzed into feature vectors, which are used to compare input signals with prestored reference signals. Patterns of any suitable form are used to calculate a similarity distance measure $d_{IJ}$, which is tested against a threshold to select likely candidates as a first step. A second step selects the most likely candidate by using "common nature" parameters of phonemes, such as relative occurrence. Five embodiments of the second step are disclosed, each using a "common nature" criterion of inference to infer (select) the most likely candidate:
     (1) $d'_I = W_1 \cdot W_2 \cdot W_3$, where W is a weighting factor;
     (2) $d''_I = C_I \, d'_I$, where $C_I$ is a correction factor;
     (3) max p(i,j), where p(i) is the probability of occurrence of the i-th phoneme;
     (4) min $d'_{ij}$, as a variation of max p(i,j); and
     (5) N(i), the numerical similarity of the common characteristics of the selected candidates.

BACKGROUND OF THE INVENTION

1. Field of the Invention:

The present invention relates to a pattern recognition method and, particularly, to an improved pattern recognition method which precisely recognizes confusing characters and the phonemes that constitute voices and correspond to the symbols constituting a language.

2. Description of the Prior Art:

In a conventional pattern recognition method, such as a method for recognizing letters or voices, an input pattern is compared with standard patterns, and the category name of the standard pattern having the optimum degree of identification is output as the result.

In recognizing letters, when, for example, the Chinese character 大 (large) is input, it generally matches well not only the standard pattern 大 (large) but also the Chinese characters 犬 (dog) and 太 (thick). In recognizing voices, when, for example, the sound /t/ is input, it usually matches well the other voiceless stop consonants /p/ and /k/, or /d/, /z/ and /s/, which have the same place of articulation. Erroneous recognition is therefore highly likely among such similar patterns, and recognition accuracy decreases.

In recognizing phonemes, a further difficulty arises. Voice is produced by a physical phenomenon, the vibration of the vocal organs, under limited physical conditions such as the length of the vocal organs; consequently, the phonemes that constitute the voice may be greatly affected by the preceding or succeeding phoneme and by the speed of speech.

Therefore, it is very difficult to precisely recognize the phoneme.

To overcome this difficulty, a method was proposed in which a spoken word containing deformed phonemes was compared, as a practical recognition unit, with a standard pattern.

With this method, however, it was necessary to prepare standard patterns for units as large as spoken words, each consisting of a combination of phonemes, and hence to store in memory the standard patterns of all spoken words to be recognized. Since a memory of tremendous capacity was required, it was virtually impossible to construct a voice recognition apparatus capable of recognizing arbitrary speech, like a so-called voice typewriter.

To recognize arbitrary speech, therefore, it is essential to perform recognition at the phoneme level.

As mentioned above, however, recognition at the phoneme level presents the following problems:

(1) Recognition becomes difficult because phonemes are deformed.

(2) A phoneme is considerably shorter than a word, which causes confusion among different phonemes.

(3) Voice is produced continuously over time, and each phoneme must be cut out as a sectional pattern from the continuous voice pattern. It is, however, very difficult to cut out the sectional patterns properly.

With respect to the third problem above, a system called the continuous DP (dynamic programming) matching method has been proposed, which continuously matches the input voice pattern with the standard patterns without having to cut the continuously produced voice pattern into segments of a predetermined duration, and its effectiveness has been confirmed. See "Continuous Speech Recognition by Continuous DP Matching" by Ryuichi Oka, Technical Report of Acoustic Society of Japan, S78-20.

To cope with the first and second problems above, on the other hand, methods have been proposed in order to:

(i) Increase the kinds of characteristic parameters so that the slightest differences among the phonemes can be detected;

(ii) Prepare standard patterns that emphasize the consonant portions of the phonemes; and

(iii) Improve the matching method so that it is less affected by deformed phonemes.

None of the above methods, however, have produced satisfactory results.

SUMMARY OF THE INVENTION

In view of the above, the object of the present invention is to provide a pattern recognition method capable of properly recognizing even confusing patterns, i.e., a pattern recognition method that eliminates the first and second problems above in recognizing phonemes and thereby enhances the recognition factor for voice patterns.

To accomplish this object, according to the present invention, the standard pattern of highest certainty, obtained by matching an unknown pattern with the standard patterns, is decided by also using, as recognition information, the matching results of other standard patterns, including resembling patterns, so as to reduce erroneous recognition and increase the recognition factor.

In accordance with the method of the invention, an input pattern is compared with standard patterns to produce an identified value for each comparison; a plurality of candidates likely to be the input pattern are selected on the basis of the identified values; and the input pattern is inferred on the basis of a predetermined criterion of inference. This criterion of inference differs from the criterion used to select the candidates, and it utilizes the nature of the selected candidates and the commonness of each selected candidate with the other candidates. In accordance with the invention, there are four preferred methods for determining the criterion of inference.

The principle of the present invention will be described below with reference to phoneme recognition based upon the pattern matching method.

In general, phonemes are not totally unrelated to one another; there are predetermined relationships among them. The phonemes can therefore be classified into several groups according to their common natures, and each phoneme belongs to several groups depending upon its natures. Recognition experiments conducted by the inventors of the present invention established the following facts:

(a) The distance obtained by comparing a phoneme group having a common nature with the standard pattern is smaller than the distance obtained by comparing a phoneme group without a common nature with the standard pattern.

(b) Since each phoneme carries only a small amount of information, even a slight deformation causes the distance resulting from the comparison to vary greatly. There is, however, a predetermined upper limit to the distance, and the distance seldom exceeds it.

(c) When the phonemes are ranked by distance, so that the phoneme with the minimum distance ranks first in certainty, the phonemes ranked highest in certainty in many cases share a common nature with the phoneme of the correct category, even when the correct phoneme is ranked below a phoneme of a different category. Conversely, phonemes without a common nature often rank low in certainty.

Relying upon these facts, the fundamental principle of the present invention consists of classifying the phonemes ranked high in certainty by the comparison into a plurality of groups according to their common natures, and specifying the phoneme that belongs in common to these groups as the input phoneme.

In this case, the precision of recognition can be increased by taking into account whether or not phonemes having little commonness with other phonemes are ranked high in certainty.

What should be used as a common nature for classifying the phonemes, and how it should be set, depends upon the characteristic parameters employed for recognition and the language concerned. A relatively stable classification, however, is obtained on the basis of the following natures:

(1) Place of articulation,

(2) Manner of production.

However, the sounds of the [g] series of the Japanese language may be produced either as /g/ (voiced stop consonant) or as /ŋ/ (nasal consonant). Therefore, a classification based upon the above natures alone is not satisfactory.

In specifically constructing an apparatus according to the invention, therefore, the phonemes should be classified according to natures determined from the language or from representative parameters.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing an example of results obtained by classifying the candidates of recognition according to their common natures;

FIG. 2 is a diagram illustrating quantities that represent the similarity between the phonemes in the input patterns and the phonemes in the standard patterns, as well as correction quantities for the phonemes in the input patterns;

FIG. 3 is a diagram showing an example of results of recognition by the first and second methods of the present invention;

FIG. 4 is a block diagram showing the principle of a pattern recognition apparatus according to a third method of the present invention;

FIG. 5 is a diagram showing an example of the average similarity between an input pattern (i) and a standard pattern (j);

FIG. 6 is a block diagram of a voice recognition apparatus according to an embodiment of the present invention;

FIG. 7 is a flow chart for checking phonemes according to the first and second methods of the present invention; and

FIG. 8 is a flow chart for checking phonemes according to the third method of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiments of the invention will be described below in detail with reference to specific data.

First, the registered unit of a standard pattern is set to be vowel-consonant-vowel (the so-called VCV unit). The unit, however, need not be limited to the VCV unit provided it is lower than the level of linguistic signs of voice such as syllables and phonemes.

Suppose the word /atataka/ is given as the input voice. Comparing it with the various VCV's prepared as standard patterns, in order to recognize the second consonant /t/, yields the following distances from the first place to the sixth place: ##EQU1##

From these results, a conventional method would erroneously recognize the consonant of the input voice as the /k/ of ①, which gives the minimum distance. The present invention provides a method that avoids this defect and extracts /t/, the correct answer, as the first candidate from /ata/, which is in fourth place in terms of distance.

According to the results of a recognition experiment conducted by the inventors of the present invention, the distance of the VCV that is the correct answer does not exceed the minimum distance among all VCV's by more than 0.2 when the sampling frequency of the input voice is 8 kHz, the Hamming window length in the continuous non-linear matching (usually referred to as DP matching) is 20 msec, and the frame interval is 10 msec. Based upon this result, the six VCV's ① to ⑥ of relation (1) in the above example serve as candidates of recognition, since their distances are smaller than

$$1.53 + 0.3 = 1.83,$$

i.e., they exceed the minimum distance 1.53 (distance ① of relation (1)) by no more than 0.3.

According to the first method of the present invention, the consonants in the six VCV's extracted as candidates of recognition (including /t/, the correct answer) are examined for their commonness.

The following facts are then found:

(i) /k/ and /p/ are voiceless stop consonants, as is /t/; they agree in manner of production and belong to the same group.

(ii) /d/, /z/ and /s/ are articulated at the tip of the tongue, as is /t/; they agree in place of articulation and belong to the same group.

FIG. 1 shows, for each of the six candidate consonants, the consonants that can be classified into the same group from the viewpoint of manner of production and of place of articulation, together with the total number (N) in each group.

According to FIG. 1, the correct consonant /t/ has the greatest number of consonants in the same group: two from the viewpoint of manner of production and three from the viewpoint of place of articulation, so that the total number N, including /t/ itself, is 6.

Therefore, if the input voice is inferred using the magnitude of N as the criterion of inference, a correct recognition result is obtained.
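The group count N of FIG. 1, and inference by its magnitude, can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation: the text states only that /k/, /p/ and /t/ share a manner of production and that /t/, /d/, /z/ and /s/ share a place of articulation, so the remaining feature labels (for example "velar" for /k/, "bilabial" for /p/, and the fricative labels for /z/ and /s/) are assumptions. A consonant is counted once even if it shares both natures.

```python
# Hypothetical feature table for the six candidate consonants of FIG. 1.
# The k/p/t (voiceless stop) and t/d/z/s (tongue-tip) groupings follow the text;
# every other label here is an assumption made for illustration.
FEATURES = {
    "k": {"manner": "voiceless stop", "place": "velar"},
    "p": {"manner": "voiceless stop", "place": "bilabial"},
    "t": {"manner": "voiceless stop", "place": "tongue tip"},
    "d": {"manner": "voiced stop", "place": "tongue tip"},
    "z": {"manner": "voiced fricative", "place": "tongue tip"},
    "s": {"manner": "voiceless fricative", "place": "tongue tip"},
}


def group_size(consonant, candidates, features=FEATURES):
    """N: candidates (itself included) sharing manner or place with `consonant`."""
    own = features[consonant]
    return sum(
        1
        for c in candidates
        if features[c]["manner"] == own["manner"] or features[c]["place"] == own["place"]
    )


candidates = ["k", "p", "t", "d", "z", "s"]
n = {c: group_size(c, candidates) for c in candidates}
best = max(candidates, key=lambda c: n[c])  # infer the input by the largest N
print(n["t"], n["k"], best)  # 6 3 t
```

With these assumed labels the counts for /t/ (6) and /k/ (3) agree with the values stated in the text, and /t/ is inferred.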

Next, to enhance the precision of recognition, new distances reflecting the classification of FIG. 1 are derived from the distances obtained by the comparison, and the input voice is inferred using these new distances as the criterion of inference.

Referring to relation (1), let $d_i$ denote the distance of the i-th order, $d_{min}$ the minimal value among $d_1$ to $d_6$ (1.53, for /aka/), $N_i$ the number of consonants in the same group as the i-th consonant in FIG. 1, and $d_{ij}$ ($j = 1, 2, \ldots, N_i$) the distances of the VCV's corresponding to those $N_i$ consonants. For /k/, for example, $i = 1$ and $N_1 = 3$, with $d_{11} = 1.53$ (/aka/), $d_{12} = 1.64$ (/ata/) and $d_{13} = 1.65$ (/apa/). A new distance $d_i'$ can then be defined for the distance of the i-th order of relation (1):

$$d_i' = w_1 \cdot w_2 \cdot w_3 \tag{2}$$

Here, $w_1$ is a weighting quantity that favors a candidate (lowers its new distance) as the number of consonants belonging to the same group increases. For instance,

$$w_1 = \frac{1}{N_i} \tag{3}$$

$w_2$ is a weighting quantity that favors a candidate as its own comparison distance decreases. For instance,

$$w_2 = 1 + d_i - d_{min} \tag{4}$$

$w_3$ is a weighting quantity that favors a candidate as the comparison distances of the VCV's belonging to the same group decrease. For instance, ##EQU2##

The distances $d_i'$ ($i = 1, 2, \ldots, 6$) of equation (2), calculated with the weighting quantities $w_1$ to $w_3$ given by equations (3) to (5), are listed below in the order corresponding to ① to ⑥ of relation (1). ##EQU3##

The distance $d_4'$, corresponding to /ata/, the correct recognition result, takes the minimal value 0.30. This verifies the effectiveness of the first method of the present invention.
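A minimal sketch of the first method's distance of equation (2) follows. Equation (5) is not reproduced in this text, so the sketch assumes that $w_3$ is the plain mean of the group's distances, which fits the later statement that equation (5) treats those distances equally; the function name `first_method_distance` is hypothetical. Only the /k/ group is fully specified in the text, so the example evaluates that group alone and does not attempt to reproduce the listed values.

```python
def first_method_distance(i, distances, groups):
    """d_i' = w1 * w2 * w3 (equation (2)) for candidate i.

    distances: candidate -> distance d_i from relation (1)
    groups:    candidate -> candidates in the same group, itself included (FIG. 1)
    """
    d_min = min(distances.values())
    members = groups[i]
    w1 = 1.0 / len(members)                                 # equation (3): 1 / N_i
    w2 = 1.0 + distances[i] - d_min                         # equation (4)
    w3 = sum(distances[j] for j in members) / len(members)  # ASSUMED form of equation (5)
    return w1 * w2 * w3


# Only the /k/ group (d11 = /aka/, d12 = /ata/, d13 = /apa/) is fully given in the text.
dist = {"/aka/": 1.53, "/ata/": 1.64, "/apa/": 1.65}
groups = {"/aka/": ["/aka/", "/ata/", "/apa/"]}
print(round(first_method_distance("/aka/", dist, groups), 3))  # 0.536
```

Under this assumption, the candidate with the smallest $d_i'$ over all six groups would be taken as the recognition result.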

According to the results of a recognition experiment conducted by the inventors of the present invention, a recognition factor of 95% is achieved using the distance $d_i'$ of the present invention, compared with 78% for the conventional method.

The above description presumed that nearly equal numbers of VCV's belong to the group of each VCV. Some VCV's, however, may have fewer VCV's in their group.

For such VCV's, the weight based on the number of VCV's belonging to the group ($w_1$ of equation (3)) is modified to balance the groups, or the modification is made depending on whether any candidate classified into the same group has a different nature. For a candidate having a different nature, weighting quantities corresponding to equations (3) to (5) and a distance $d_i''$ corresponding to $d_i'$ of equation (2) are found according to the nature of that candidate, and the modification is made according to the ratio $d_i'/d_i''$.

When the likelihood ratio is used, a VCV close to the average spectral characteristics tends to appear as a candidate of recognition for many different VCV's, and its likelihood ratio value loses significance correspondingly. A VCV with strongly deviating features, on the other hand, appears as a candidate only for specific groups, so the distance $d_i$ can be modified beforehand by utilizing this property.

The above description dealt with a method in which the degree of commonness is expressed in two steps, "1" (common) or "0" (not common). The consonant /k/ of FIG. 1, for example, is common with /t/ and /p/ in manner of production and therefore has a similarity degree of 1 with them, whereas it is common with /d/, /z/ and /s/ in neither manner of production nor place of articulation and therefore has a similarity degree of 0 with them. In other words, that method treats all objects of recognition belonging to the same group equally on the basis of their common nature. Described below is a second method according to the present invention, in which the common nature is expressed by a numerical value between 0 and 1 according to the degree of commonness, so as to evaluate the commonness among phonemes fairly and to correct for differences in the number of similar phonemes.

First, the similarity degrees $P_{IJ}$ between the phonemes I of the input voices to be recognized and the phonemes J of the standard patterns are found and tabulated. The similarity degrees $P_{IJ}$ may be prepared from quantities defined phonemically on the basis of shared distinctive features, or from the results of checking in the voice recognition apparatus.

FIG. 2 tabulates specific examples of quantities corresponding to the similarity degree $P_{IJ}$. Here, the value for I = J is taken as 1, values between 0 and 1 are rounded to 0.0, 0.2, 0.4, 0.6, 0.8 or 1.0, and the results are multiplied by 100.

The similarity degree $P_{IJ}$ represents the degree of similarity between I and J. Therefore, $1 - P_{IJ}$ can be regarded as representing the degree of dissimilarity between I and J.

The unknown input voice is now denoted by I and is matched against the standard patterns J, and the L distances with the greatest similarities, i.e., the L distances lying within a predetermined threshold, are used. (In the following description, similarity is expressed by the distance $d_{IJ}$: the smaller $d_{IJ}$, the greater the similarity.) If these distances are arranged in increasing order as

$$d_{I1}, d_{I2}, d_{I3}, \ldots, d_{IL} \tag{7}$$

the unknown input voice I will be identified as one of the patterns 1 to L.

In inferring from these quantities that the unknown voice is I, the precision of inference can be increased by the following processing.

First, if ##EQU4## is calculated, $S_I$ becomes a quantity indicating the degree to which the input voice is not I.

Moreover, an increase in the distance $d_{IJ}$ indicates an increasing degree to which I is not J.

Therefore, if $S_I$ and $d_{IJ}$ are combined to define ##EQU5## then $d_I'$ can be regarded as a quantity indicating the degree to which the unknown voice is not I. Using this quantity as the criterion of inference, the voice is inferred to be $I_0$ when

$$d'_{I_0} = \min[d'_1, d'_2, d'_3, \ldots, d'_L]$$

The distance $d_I'$ calculated by equation (9) corresponds to $d_i'$ of equation (2). In finding the weighting quantity $w_3$ of equation (2), however, the candidate distances

$$d_{i1}, d_{i2}, d_{i3}, \ldots, d_{iN_i}$$

are all treated equally, as given by equation (5).

In equation (9), on the other hand, the weighting $1 - P_{IJ}$ is applied to all of the candidate distances

$$d_{I1}, d_{I2}, d_{I3}, \ldots, d_{IL}$$

according to the similarity between I and J ($J = 1, 2, \ldots, L$), to obtain the weighted-average distance $d_I'$. It is therefore possible to obtain a distance that more faithfully reflects the distances to the standard patterns.

For an input voice I that has few similar phonemes, the number L of candidates in (7) is small, and the distance $d_I'$ is generally large, making correct recognition difficult.

To correct this, a correction coefficient $C_I$ for the distance $d_I'$ is introduced to define ##EQU6## and, using this quantity as the criterion of inference, the voice is inferred to be $I_0$ when

$$d''_{I_0} = \min[d''_1, d''_2, d''_3, \ldots, d''_L]$$

For example, the correction coefficient $C_I$ is calculated as follows from $P_{IJ}$, which equals 1/100 of the values in FIG. 2 (the resulting values are shown in the bottom row of FIG. 2): ##EQU7## where M denotes the total number of standard patterns prepared.

Phonemes with large values of $C_I$ have many similar phonemes, and their distances $d_I'$ of equation (9) tend to be small. Using the distance $d_I''$ corrected by $C_I$ therefore allows all phonemes to be recognized fairly.
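The second method can be sketched as below. Because equations (8) to (11) are not reproduced in this text, the forms used here are assumptions chosen only to respect the properties the text states: $d_I'$ applies the weight $1 - P_{IJ}$ to every candidate distance, $C_I$ grows with the number of phonemes similar to I, and, following the abstract, $d_I'' = C_I d_I'$. The function name and the similarity table and distances in the example are hypothetical toy values, not data from FIG. 2 or FIG. 3.

```python
def second_method(candidates, P):
    """Return (d1, d2, best) for the L retained candidates.

    candidates: standard pattern J -> distance d_J (the L candidates of (7))
    P:          P[I][J] in [0, 1], similarity of phonemes I and J over all M
                standard patterns (FIG. 2 values divided by 100)
    ASSUMED forms, since equations (8)-(11) are not reproduced in the text:
      d_I'  = (1/L) * sum over candidates J of (1 - P[I][J]) * d_J
      C_I   = (1/M) * sum over all M patterns J of P[I][J]
      d_I'' = C_I * d_I'   (as in the abstract)
    """
    L = len(candidates)
    d1, d2 = {}, {}
    for I in candidates:
        d1[I] = sum((1.0 - P[I][J]) * d_J for J, d_J in candidates.items()) / L
        C_I = sum(P[I].values()) / len(P[I])
        d2[I] = C_I * d1[I]
    best = min(d2, key=d2.get)  # inferred phoneme I_0
    return d1, d2, best


# Hypothetical toy similarity table and distances (not taken from the figures).
P = {
    "s": {"s": 1.0, "z": 0.8, "t": 0.4},
    "z": {"s": 0.8, "z": 1.0, "t": 0.4},
    "t": {"s": 0.4, "z": 0.4, "t": 1.0},
}
candidates = {"t": 1.50, "s": 1.55, "z": 1.60}
d1, d2, best = second_method(candidates, P)
print(round(d1["t"], 3), round(d2["t"], 3), best)  # 0.63 0.378 z
```

The actual shapes of equations (9) and (11) may differ; the sketch only illustrates the flow of weighting, correcting and taking the minimum.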

In recognition experiments conducted by the inventors of the present invention, nine of about 100 objects were erroneously recognized when the distance $d_{IJ}$ was used, four when the distance $d_I'$ was used, and only one when the distance $d_I''$ was used.

FIG. 3 shows the results of recognition using the distances $d_I'$ and $d_I''$ for the four consonants whose distances $d_{IJ}$ usually rank first to fourth (smallest first) when the input voice to be recognized is the consonant /s/.

In FIG. 3, the consonant is correctly recognized as /s/ when $d_I''$ is used, even though it may be erroneously recognized as /t/ or /z/ when $d_{IJ}$ or $d_I'$ is used.

In the above two methods, some of the standard patterns are selected as candidates for recognition on the basis of the compared values, and the unknown pattern is inferred from the candidates by a predetermined criterion of inference.

A third method of the present invention is described below; it uses a criterion of inference extracted from the combined information of the input pattern and a plurality of standard patterns.

Let i denote an input pattern, j a standard pattern, $d_{i,j}$ the degree of similarity corresponding to the compared value of i and j, $p(i)$ the appearance probability of input pattern i, $p(d_{i,j} \mid i, j)$ the probability that the similarity degree between i and j is $d_{i,j}$, $p(i \mid d_{i,j})$ the probability that the input pattern is i when the similarity degree is $d_{i,j}$, and $p(i, j)$ the probability that input pattern i is compared with standard pattern j. When input pattern i is compared with standard pattern j, the probability $p(i \mid i, j)$ that input pattern i agrees with standard pattern j is then given by

$$p(i \mid i, j) = p(i) \cdot p(i, j) \cdot p(d_{i,j} \mid i, j) \cdot p(i \mid d_{i,j}) \tag{12}$$

In the conventional method, j is presumed to equal i, and the input pattern is specified by the i that satisfies ##EQU8##

In the third method of the present invention, on the other hand, the input pattern is specified by the i that maximizes ##EQU9## where N denotes the total number of standard patterns, using ##EQU10## as the criterion of inference.

The probability $p(i)$ can be determined statistically from the distribution of patterns; for the phonemes of the Japanese language, for example, it can be taken from the results of investigations of phoneme frequency.

When all of the standard patterns are compared with all of the input patterns, $p(i, j) = 1/N$. The probabilities $p(d_{i,j} \mid i, j)$ and $p(i \mid d_{i,j})$ can be determined by defining the practical characteristic parameters and similarity degree and observing the resulting distribution of the data. The distribution of $d_{ij}$ differs depending upon the parameters and the similarity degree. When i = j in particular, the distribution often becomes asymmetric about the average value $\bar{d}_{ij}$ of $d_{ij}$. In many cases, however, the distribution is symmetric and can be approximated by the normal distribution. It is therefore practically convenient to normalize the distribution by the dispersion $\sigma_{i,j}$ and treat it as a function of ##EQU11## Therefore, if

$$p(d_{i,j} \mid i, j) \cdot p(i \mid d_{i,j})$$

is approximated by the normal distribution as ##EQU12## the value of equation (15) increases as $\delta_{i,j}$ decreases. The sum in equation (14) may therefore be limited to the n combinations of i and j having small values of $\delta_{ij}$ (equation (14) is then evaluated over n terms, fewer than the total number N). When the likelihood ratio or a squared distance is used as the similarity degree, the value between patterns of small similarity changes greatly even for a slight change in the patterns and becomes unstable. Because of this instability, $\sigma_{ij}$ becomes large and the apparent value of $\delta_{ij}$ becomes small. In such a case, the terms summed in equation (14) are not simply limited to those with small $\delta_{ij}$; instead, $d_{ij}$ itself is limited to values of high certainty (i.e., a small likelihood ratio or distance). In this case too, equation (14) is evaluated for the outputs corresponding to n standard patterns, fewer than the total number N. Hereafter, N is understood to include this meaning of n.

Accordingly, instead of by equation (14), the input pattern can be specified by the i that approximately satisfies ##EQU13## Furthermore, if ##EQU14## and equation (16) is rewritten as ##EQU15## then no division needs to be performed.
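Under the normal approximation, maximizing the sum in equation (14) can be approximated by minimizing a normalized squared deviation of the observed distances from their averages. The sketch below assumes that form for equations (16) and (17), averaged over the n retained candidates, which matches the division by 4 (the number of candidates) in the worked example of the first embodiment later in the text. The function name and the toy averages are hypothetical and are not FIG. 5 data.

```python
def third_method_scores(observed, mean, sigma=None):
    """Rank hypotheses i by (1/n) * sum_j ((d_j - mean[i][j]) / sigma[i][j]) ** 2.

    observed: candidate j -> distance d_j of the unknown input (the n retained candidates)
    mean:     mean[i][j], average distance when input i is matched with pattern j (cf. FIG. 5)
    sigma:    optional sigma[i][j]; if omitted every dispersion is taken as 1
    ASSUMED form of equations (16)/(17); smaller scores mean higher certainty.
    """
    n = len(observed)
    scores = {}
    for i in observed:
        total = 0.0
        for j, d_j in observed.items():
            s = 1.0 if sigma is None else sigma[i][j]
            total += ((d_j - mean[i][j]) / s) ** 2
        scores[i] = total / n
    return dict(sorted(scores.items(), key=lambda kv: kv[1]))


# Hypothetical averages: input "a" usually matches pattern "b" closely (common nature),
# while input "b" usually matches pattern "a" poorly.
mean = {"a": {"a": 1.0, "b": 1.1}, "b": {"a": 1.6, "b": 1.0}}
observed = {"a": 1.2, "b": 1.1}  # "b" has the smaller raw distance
ranking = third_method_scores(observed, mean)
print(list(ranking))  # ['a', 'b']
```

In this toy case the hypothesis "a" ranks first even though "b" has the smaller raw distance, because the whole pattern of distances fits the averages expected for input "a" better.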

Discussed below is a modification based on the idea of the matching method of the third method, which utilizes information consisting of combinations of i and j. Equation (17) is modified as follows: ##EQU16## where w denotes a weight, and $a_{ij}$ and $c_0$ denote constants.

Here, $a_{ij}$ is defined as follows:

$$a_{ij} = c_{ij} - c_0 \tag{19}$$

where $c_{ij}$ is the average value of $d_{ij}$ ($c_{ij} = \bar{d}_{ij}$). The constant $c_0$ is determined so that $d_{ij}$ usually does not become greater than $c_0$ when the input pattern i and the standard pattern j have some nature in common, and does not become smaller than $c_0$ when they have none in common. With $c_0$ determined in this way, the term $a_{ij}(c_0 - d_{ij})$ in equation (18) is negative in most cases when i and j have some nature in common, and positive in most cases when they have none. Therefore, the second term of equation (18), i.e., ##EQU17## works to correct the result $d_{ij}$ of the j-th matching portion according to its degree of commonness with the results of the other matching portions. In particular cases, $a_{ij}$ may be set to 0; the correction term for that combination then need not be computed, which reduces the amount of computation. When the phonemic commonness is very small, $d_{ij}$ often becomes unstable, so for such combinations $a_{ij}$ should be set to 0 beforehand to obtain stable results. Further, a value of $d_{ij}$ greater than a predetermined level is not reliable, and it is better not to use the corresponding term.
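The correction of equation (18) can be sketched as follows. Equation (18) itself is not reproduced in this text, so the sketch assumes that the correction for hypothesis i is w times the sum of $a_{ij}(c_0 - d_j)$ over the other retained candidates j, with $a_{ij} = c_{ij} - c_0$ from equation (19). Storing $c_0$ as $c_{ij}$ for a pair makes $a_{ij} = 0$, as the text permits, and distances above an optional reliability limit are skipped. The function name and all numbers in the example are hypothetical.

```python
def corrected_distances(observed, c_mean, c0, w=1.0, reliable_limit=None):
    """Correct each candidate's distance using the results of the other candidates.

    ASSUMED form of equation (18), which is not reproduced in the text:
        d_i' = d_i + w * sum over j != i of a_ij * (c0 - d_j),  a_ij = c_ij - c0  (19)
    observed:       candidate -> measured distance d
    c_mean:         c_mean[i][j] = c_ij, the average distance for the pair (cf. FIG. 5);
                    storing c0 for a pair gives a_ij = 0, i.e. no correction from it
    reliable_limit: distances above this level are treated as unreliable and skipped
    """
    corrected = {}
    for i, d_i in observed.items():
        correction = 0.0
        for j, d_j in observed.items():
            if j == i or (reliable_limit is not None and d_j > reliable_limit):
                continue
            a_ij = c_mean[i][j] - c0  # equation (19)
            correction += a_ij * (c0 - d_j)
        corrected[i] = d_i + w * correction
    return dict(sorted(corrected.items(), key=lambda kv: kv[1]))


# Hypothetical averages chosen so that a_xy < 0 (common nature) and a_yx > 0 (none).
observed = {"y": 1.6, "x": 1.8}
c_mean = {"x": {"y": 2.0}, "y": {"x": 2.6}}
result = corrected_distances(observed, c_mean, c0=2.2)
print({k: round(v, 3) for k, v in result.items()})  # {'x': 1.68, 'y': 1.76}
```

After the assumed correction, "x" moves ahead of "y", illustrating how commonness with the other matching results can reorder the candidates.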

Described below is a more specific illustration of the principle of the third method as adapted for recognizing voice, particularly phonemes in continuous voice.

FIG. 4 is a block diagram of an apparatus for recognizing voice based upon the above principle. FIG. 4 mainly illustrates the matching portion that executes the operation of equation (14), to show the principle of the third method of the present invention, together with the flow of signals. The input voice 1 is converted into characteristic parameters by an analyzing circuit 2 and sent to identifying circuits 3-1 to 3-N for checking against the standard pattern memories 4-1 to 4-N of the respective phonemes. The results 5-1 to 5-N of checking (identification) against the phonemes are sent to matching circuits 6-1 to 6-N, which use them to perform the calculations corresponding to the respective terms of equation (14). Their results 7-1 to 7-N are sent to a discriminating circuit 8, which compares them, discriminates the phoneme having the highest degree of certainty, and produces a signal 9.

A first embodiment of the third method, based upon equation (14), is illustrated below.

The likelihood ratio of the tenth order is used as the degree of similarity.

First, the registered unit of a standard pattern is vowel-consonant-vowel (the so-called VCV unit). The unit need not be limited to the VCV unit provided it is lower than the level of linguistic signs of voice such as syllables or phonemes.

According to the results of recognition experiments conducted by the inventors of the present invention, the distance of the VCV that is the correct answer does not exceed the minimum distance among all candidate VCV's by more than 0.2 when the sampling frequency of the input voice is 8 kHz, the Hamming window length in the continuous non-linear matching using dynamic programming (usually called continuous DP matching) is 20 msec, and the frame interval is 10 msec. Further, the distance of the VCV that is the correct answer seldom exceeds 2.0; a distance exceeding 2.0 should be rejected as stemming from an unstable input. Therefore, only values of $d_{ij}$ that exceed the distance of greatest certainty by no more than 0.4 and are smaller than 2.0 are used. The results $d_{ij}$ produced by the identifying circuits 3-1 to 3-N for a /k/ in the input voice /Kagakuhooteishiki/ are as follows.

First place: /g/ 1.634

Second place: /k/ 1.774

Third place: /b/ 1.910

Fourth place: /p/ 1.927
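The candidate-selection rule just described (keep distances within 0.4 of the best one and reject any distance of 2.0 or more) can be sketched in Python. The function name `filter_candidates` is hypothetical; the example reuses the four /k/ results listed above, all of which pass both tests.

```python
def filter_candidates(results, margin=0.4, reject_above=2.0):
    """Keep distances within `margin` of the best one and below `reject_above`,
    returned in increasing order of distance (decreasing certainty)."""
    d_best = min(results.values())
    kept = {j: d for j, d in results.items() if d <= d_best + margin and d < reject_above}
    return dict(sorted(kept.items(), key=lambda kv: kv[1]))


results = {"/g/": 1.634, "/k/": 1.774, "/b/": 1.910, "/p/": 1.927}
print(list(filter_candidates(results)))  # ['/g/', '/k/', '/b/', '/p/']
```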

In equation (17), if the average values $\bar{d}_{ij}$ are as measured in FIG. 5 and the dispersion $\sigma_{ij}$ is presumed to be 1, the results become:

First place: /k/ 0.847/4

Second place: /p/ 1.433/4

Third place: /b/ 2.237/4

Fourth place: /g/ 3.067/4

Thus, /k/ becomes the first place.

A modified method based on equation (18), which is a second embodiment of the third method, is described below.

When,

First place: /g/ 1.634

Second place: /k/ 1.774

Third place: /b/ 1.910

Fourth place: /p/ 1.927

if C_(O) = 2.2, W = 1.0, and C_(ij) is as given in FIG. 5, the corrected values d_(ij)' become:

First place: /k/ 1.672

Second place: /g/ 1.839

Third place: /p/ 1.927

Fourth place: /b/ 1.997

and the correct answer /k/ takes the first place.

An apparatus for recognizing voice according to the present invention is described below for the case in which phonemes in continuous speech are to be recognized.

FIG. 6 is a block diagram of an apparatus for recognizing voice according to an embodiment of the present invention.

In FIG. 6, an input voice 61 passes through a lowpass filter (LPF) 62 that prevents aliasing noise and is converted into digital signals by an analog-to-digital converter (ADC) 63. A conventional characteristic parameter analyzing circuit 64 then produces, once per frame interval (for example, 10 msec), frame data consisting of a short-term autocorrelation [v_(i)] and a residual power P_(O) as characteristic parameters.
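A sketch of the per-frame analysis of circuit 64 follows, assuming the signal has already passed through the LPF 62 and the ADC 63. The 8 kHz sampling rate, 20 msec Hamming window, 10 msec frame interval and tenth order come from the text; obtaining P_(O) as the final residual power of the Levinson-Durbin recursion is an assumption about how the residual power is computed, and all function names are hypothetical.

```python
import math
from typing import List, Tuple


def hamming(n: int) -> List[float]:
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def autocorr(x: List[float], order: int) -> List[float]:
    """Short-term autocorrelation v_0 .. v_order of one windowed frame."""
    return [sum(x[t] * x[t + k] for t in range(len(x) - k)) for k in range(order + 1)]


def levinson(r: List[float]) -> Tuple[List[float], float]:
    """Levinson-Durbin recursion.

    Returns inverse-filter coefficients a (a[0] = 1) and the residual
    (prediction-error) power of the final order.
    """
    order = len(r) - 1
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame, e.g. digital silence
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def analyze(samples: List[float], fs: int = 8000, win_ms: int = 20,
            hop_ms: int = 10, order: int = 10) -> List[Tuple[List[float], float]]:
    """Return one (v, P0) pair per frame: autocorrelation and residual power."""
    n = fs * win_ms // 1000    # 160 samples per window at 8 kHz
    hop = fs * hop_ms // 1000  # 80 samples between frames
    w = hamming(n)
    frames = []
    for start in range(0, len(samples) - n + 1, hop):
        x = [samples[start + t] * w[t] for t in range(n)]
        v = autocorr(x, order)
        _, p0 = levinson(v)
        frames.append((v, p0))
    return frames
```

Because each Levinson-Durbin step multiplies the error by (1 - k²), the residual power never exceeds v_0, which gives a quick sanity check on the output.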

A likelihood ratio representing the similarity between the series of input frame data and the series of frame data of the standard patterns stored in a standard pattern memory 66 is calculated by a likelihood ratio calculating circuit 65.
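The exact definition of the likelihood ratio is not given in this section. The sketch below assumes the common Itakura form, in which an input frame (autocorrelation v_(i), residual power P_(O)) is scored against a standard-pattern frame stored as inverse-filter (LPC) coefficients `a_ref`; that storage format and the function name are assumptions. Under this form the ratio equals 1 when the reference filter is the input's own optimal predictor and grows with mismatch, so a smaller value means greater similarity.

```python
from typing import List


def likelihood_ratio(v: List[float], p0: float, a_ref: List[float]) -> float:
    """Likelihood ratio of an input frame against a reference frame (assumed Itakura form).

    v     -- autocorrelation v_0 .. v_p of the input frame
    p0    -- residual power of the input frame's own optimal predictor
    a_ref -- inverse-filter coefficients of the reference frame, a_ref[0] = 1
    """
    p = len(a_ref) - 1
    if len(v) < p + 1:
        raise ValueError("need at least p + 1 autocorrelation values")
    # Autocorrelation of the reference coefficient sequence
    r_a = [sum(a_ref[j] * a_ref[j + k] for j in range(p + 1 - k)) for k in range(p + 1)]
    # Residual energy of the input frame passed through the reference inverse filter
    energy = r_a[0] * v[0] + 2.0 * sum(r_a[k] * v[k] for k in range(1, p + 1))
    return energy / p0
```

For a first-order example with v = [1.0, 0.5], the optimal predictor is a = [1, -0.5] with residual power 0.75, so scoring the frame against that predictor gives exactly 1, while the mismatched reference [1, 0] gives 4/3.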

Based on the likelihood ratio thus calculated, a conventional continuous DP matching circuit 67 computes the optimum identified value with the aid of an intermediate result memory 68, thereby obtaining the distance [d_(IJ)].
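A simplified stand-in for the continuous DP matching of circuit 67 is sketched below. It is not the slope-constrained continuous DP algorithm itself: it assumes a precomputed frame-distance matrix `local[t][j]` (for example, likelihood ratios between input frame t and reference frame j), lets a match start on any input frame, advances the input by one or two frames per reference frame, and reports for every input frame the average distance of the best match ending there. All names are hypothetical.

```python
from typing import List, Sequence


def continuous_dp(local: Sequence[Sequence[float]]) -> List[float]:
    """Best average frame distance of a match ending on each input frame.

    Each reference frame is visited exactly once and the input advances by
    1 or 2 frames per reference step, so every path has exactly J terms and
    dividing by J normalizes it.
    """
    inf = float("inf")
    T, J = len(local), len(local[0])
    g = [[inf] * J for _ in range(T)]  # accumulated distances
    result = []
    for t in range(T):
        g[t][0] = local[t][0]  # free starting point on the input axis
        for j in range(1, J):
            best = inf
            if t >= 1:
                best = g[t - 1][j - 1]
            if t >= 2:
                best = min(best, g[t - 2][j - 1])
            g[t][j] = best + local[t][j]
        result.append(g[t][J - 1] / J)
    return result
```

For a two-frame reference whose frames match input frames 1 and 2 exactly, `continuous_dp([[5, 5], [0, 5], [5, 0], [5, 5]])` returns `[inf, 5.0, 0.0, 2.5]`, so the best match ends on input frame 2.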

The distances [d_(IJ)] (J = 1, 2, ...) are fed via a buffer 69 to a phoneme identified value processing circuit 600, where recognition processing is carried out according to the method of the present invention, and a final phoneme recognition result 610 is produced.

Here, the phoneme identified value processing circuit 600 may be implemented with an ordinary microprocessor. When the first and second methods of the present invention are carried out with the microprocessor, the portions surrounded by a dotted line in the flow chart of FIG. 7 are executed. When the third method of the present invention is carried out, the processing follows the flow chart of FIG. 8.
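The flow charts of FIGS. 7 and 8 are not reproduced here. As an illustration of the counting criterion recited in claims 2, 7 and 11 (infer the candidate that shares a common nature with the greatest number of other candidates), the sketch below uses a hypothetical feature table and a hypothetical definition of "common nature" (same voicing and manner, or same place of articulation). Ties are broken by the smaller identified value, which is also an assumption.

```python
from typing import Dict, Tuple

# Hypothetical articulatory features: (voicing, place, manner)
FEATURES: Dict[str, Tuple[str, str, str]] = {
    "p": ("voiceless", "bilabial", "stop"),
    "t": ("voiceless", "alveolar", "stop"),
    "k": ("voiceless", "velar", "stop"),
    "d": ("voiced", "alveolar", "stop"),
    "s": ("voiceless", "alveolar", "fricative"),
    "z": ("voiced", "alveolar", "fricative"),
}


def shares_nature(a: str, b: str) -> bool:
    """Hypothetical 'common nature': same voicing and manner, or same place."""
    va, pa, ma = FEATURES[a]
    vb, pb, mb = FEATURES[b]
    return (va == vb and ma == mb) or pa == pb


def infer_by_count(candidates: Dict[str, float]) -> str:
    """Infer the candidate sharing a nature with the most other candidates.

    `candidates` maps phoneme -> identified value (distance); ties go to the
    smaller distance.
    """
    def key(c: str) -> Tuple[int, float]:
        count = sum(shares_nature(c, o) for o in candidates if o != c)
        return (-count, candidates[c])

    return min(candidates, key=key)
```

With invented distances in which /p/ scores best, `infer_by_count({"p": 1.50, "t": 1.55, "k": 1.60, "d": 1.70, "s": 1.80, "z": 1.90})` returns `"t"`, because /t/ shares a nature with all five other candidates while /p/ shares one only with /t/ and /k/.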

The foregoing description has employed the likelihood ratio as the measure of similarity. Therefore, the circuits following the continuous DP matching circuit 67 in FIG. 6 perform processing in which certainty increases as the value decreases. The same holds true when a distance is used as the measure of similarity.

When a correlation is used, however, the processing must be carried out so that certainty increases as the value increases. For example, the reliability must increase as the weighting quantities w₁, w₂ and w₃ in equation (2) increase. The present invention naturally includes such modifications.
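One way to accommodate both conventions is to map every score onto a distance-like scale before the later stages, negating correlations. The helper names below are hypothetical. Negation preserves differences, so a margin rule such as the 0.4 rule still applies unchanged, but an absolute rejection level such as 2.0 would need its own correlation-specific value.

```python
from typing import Dict


def to_distance_like(scores: Dict[str, float], higher_is_better: bool) -> Dict[str, float]:
    """Map scores so that a smaller value always means higher certainty."""
    sign = -1.0 if higher_is_better else 1.0
    return {name: sign * s for name, s in scores.items()}


def most_certain(scores: Dict[str, float], higher_is_better: bool = False) -> str:
    """Most certain candidate under either convention."""
    oriented = to_distance_like(scores, higher_is_better)
    return min(oriented, key=lambda name: oriented[name])
```

For example, `most_certain({"g": 1.634, "k": 1.774})` picks `"g"` on the distance scale, while `most_certain({"g": 0.80, "k": 0.90}, higher_is_better=True)` picks `"k"` on a correlation scale.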

According to the present invention as described above, voice units such as phonemes can be recognized stably and precisely at a level below the linguistic level of signs, which is highly effective.

What is claimed is:
1. A machine implemented pattern recognition method comprising comparing input patterns with standard patterns, selecting a plurality of candidates that are likely to be input patterns according to identified values that represent the results of the comparison, and inferring an input pattern among the selected candidates that are likely to be input patterns according to a predetermined criterion of inference which is determined from the commonness of the nature of each of the selected candidates with the other selected candidates.
2. A machine implemented pattern recognition method according to claim 1, wherein said criterion of inference is a number of candidates having a nature common to that of said selected candidates, and a candidate having the greatest number of said candidates having a nature common to that of said selected candidate is inferred as the input pattern.
3. A pattern recognition method according to claim 1, wherein said criterion of inference is a product of a value which corresponds to the reciprocal of the number of candidates having a nature common to that of said selected candidates, a value corresponding to said identified value of each of the candidates, and an average value of said identified value in each of the candidates and in the candidates having a nature common to said each of the candidates.
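The product criterion of claim 3 can be sketched as follows. The claim speaks only of "a value which corresponds to" the reciprocal count and to the identified value, so this sketch assumes the simplest choices: 1/(1 + N), counting the candidate in its own group (which also avoids division by zero when N = 0), and the identified value itself. The common-nature test is passed in as a hypothetical predicate, and the smallest product is taken as most certain on the distance scale.

```python
from typing import Callable, Dict


def infer_by_product(candidates: Dict[str, float],
                     common: Callable[[str, str], bool]) -> str:
    """Infer the candidate with the smallest (1/count) * value * group-average product.

    `candidates` maps candidate -> identified value (distance);
    `common(a, b)` is a hypothetical common-nature predicate.
    """
    scores: Dict[str, float] = {}
    for c, d in candidates.items():
        group = [o for o in candidates if o != c and common(c, o)]
        size = 1 + len(group)  # assumption: the candidate counts in its own group
        inv_count = 1.0 / size
        avg = (d + sum(candidates[o] for o in group)) / size
        scores[c] = inv_count * d * avg
    return min(scores, key=lambda name: scores[name])
```

With values {"a": 1.0, "b": 1.1, "c": 1.2} where only /b/ and /c/ share a nature, the products are 1.0, 0.6325 and 0.69, so "b" is inferred even though "a" has the smallest identified value.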
4. A pattern recognition method according to claim 1, wherein said criterion of inference is a value corresponding to a weighted average value of a degree of similarity and an identified value between said selected candidates and candidates having a nature common to said candidates.
5. A pattern recognition method according to claim 1, wherein said criterion of inference assumes a quantity given by ##EQU18## where p(i) denotes the probability of occurrence of input pattern i (i = 1, 2, ..., N), p(d_(i,j) | i,j) denotes the probability that a quantity corresponding to the degree of similarity between input pattern i and standard pattern j (j = 1, 2, ..., N) is d_(i,j), p(i | d_(i,j)) denotes the probability that the input pattern is i when the quantity corresponding to said degree of similarity is d_(i,j), and p(i,j) denotes the probability that input pattern i is checked with standard pattern j.
6. A machine implemented pattern recognition method for inferring an input pattern, comprising the steps of: comparing an input pattern with a plurality of standard patterns; selecting a plurality of candidates that are likely to be the input pattern based upon identified values that represent the results of the comparison of the input pattern with the standard patterns; and inferring an input pattern from the plurality of candidates based upon a predetermined criterion of inference for evaluating the selected plurality of candidates, the predetermined criterion of inference being different from the criteria for selecting the plurality of candidates and utilizing at least one characteristic parameter of each of the selected plurality of candidates and the commonness of at least one characteristic parameter between each selected candidate and the other remaining selected candidates.
7. A machine implemented pattern recognition method in accordance with claim 6, wherein said criterion of inference is determined for each of said selected candidates by calculating the number of candidates having a characteristic parameter common to each selected candidate, and the input pattern is inferred by choosing the candidate having the greatest calculated number.
8. A machine implemented pattern recognition method according to claim 6, wherein said criterion of inference is a product of a value which corresponds to the reciprocal of the number of candidates having a characteristic parameter common to that of said selected candidates, a value corresponding to said identified value of each of the candidates, and an average value of said identified value in each of the candidates and in the candidates having a characteristic parameter common to said each of the candidates.
9. A machine implemented pattern recognition method according to claim 6, wherein said criterion of inference is a value corresponding to a weighted average value of a degree of similarity and an identified value between said selected candidates and candidates having a characteristic parameter common to said candidates.
10. A machine implemented pattern recognition method according to claim 6, wherein said criterion of inference assumes a quantity given by ##EQU19## where p(i) denotes the probability of occurrence of input pattern i (i = 1, 2, ..., N), p(d_(i,j) | i,j) denotes the probability that a quantity corresponding to the degree of similarity between input pattern i and standard pattern j (j = 1, 2, ..., N) is d_(i,j), p(i | d_(i,j)) denotes the probability that the input pattern is i when the quantity corresponding to said degree of similarity is d_(i,j), and p(i,j) denotes the probability that input pattern i is checked with standard pattern j.
11. A voice pattern machine implemented method comprising: a first step of comparing an unknown input pattern with prestored standard voice patterns using a distance measure to find probable candidates and a second step of inferring the unknown pattern from the plurality of candidates based on a predetermined criterion of inference relying upon the common nature of the selected candidates with other candidates, wherein said criterion of inference is determined for each of said selected candidates by calculating the number of candidates having a characteristic parameter common to each selected candidate and the input pattern is inferred by choosing the candidate having the greatest calculated number.
12. A voice pattern machine implemented method comprising: a first step of comparing an unknown input pattern with prestored standard voice patterns using a distance measure to find probable candidates and a second step of inferring the unknown pattern from the plurality of candidates based on a predetermined criterion of inference relying upon the common nature of the selected candidates with other candidates, wherein said criterion of inference is a product of a value which corresponds to the reciprocal of the number of candidates having a characteristic parameter common to that of said selected candidates, a value corresponding to said identified value of each of the candidates, and an average value of said identified value in each of the candidates and in the candidates having a characteristic parameter common to said each of the candidates.
13. A voice pattern machine implemented method comprising: a first step of comparing an unknown input pattern with prestored standard voice patterns using a distance measure to find probable candidates and a second step of inferring the unknown pattern from the plurality of candidates based on a predetermined criterion of inference relying upon the common nature of the selected candidates with other candidates, wherein said criterion of inference is a value corresponding to a weighted average value of a degree of similarity and an identified value between said selected candidates and candidates having a characteristic parameter common to said candidates.
14. A voice pattern machine implemented method comprising: a first step of comparing an unknown input pattern with prestored standard voice patterns using a distance measure to find probable candidates and a second step of inferring the unknown pattern from the plurality of candidates based on a predetermined criterion of inference relying upon the common nature of the selected candidates with other candidates, wherein said criterion of inference assumes a quantity given by ##EQU20## where p(i) denotes the probability of occurrence of input pattern i (i = 1, 2, ..., N), p(d_(i,j) | i,j) denotes the probability that a quantity corresponding to the degree of similarity between input pattern i and standard pattern j (j = 1, 2, ..., N) is d_(i,j), p(i | d_(i,j)) denotes the probability that the input pattern is i when the quantity corresponding to said degree of similarity is d_(i,j), and p(i,j) denotes the probability that input pattern i is checked with standard pattern j.